On the Power of Semantic Partitioning of Web Documents
Abstract
A growing number of Web sites are maintained by content management software, and thus a large number of Web pages are machine-generated from templates. Such pages normally have an implicit, fixed “schema”; what changes is the content. Informally, a schema for a Web page represents concepts and the relationships among them in a hierarchical fashion. For example, Figure 1 is a screenshot of the New York Times front page (see http://www.nytimes.com). Observe that this page includes: (i) a taxonomy of items such as “NEWS” (consisting of hyperlinks labeled “International”, “National”, ...), “OPINION” (consisting of hyperlinks “Editorial/Op-Ed”, ...), etc.; and (ii) several headlines of news articles, where each article begins with a hyperlink labeled with the news headline (e.g., “Bush tells Nation ...”), followed by the author of the article (e.g., “By Richard W. Stevenson ...”), a time-stamp, and a text summary of the article (e.g., “President Bush portrayed ...”). The schema for this fragment of the New York Times front page therefore includes the taxonomy (which does not change) and the template for the news article. The full schema also includes several additional elements pertaining to other content appearing on the page.
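As a rough illustration only, the informal notion of a schema described above (a fixed taxonomy plus a repeating article template, arranged hierarchically) could be modeled as a small tree. The names `Concept`, `paths`, `instantiate`, and the field labels are hypothetical choices for this sketch and are not taken from the paper; the time-stamp value is a placeholder, since the abstract does not give one.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass
class Concept:
    """A node in a hypothetical page schema: a label plus child concepts."""
    label: str
    children: List["Concept"] = field(default_factory=list)

    def paths(self, prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[str, ...]]:
        """Yield every root-to-leaf path of labels in the hierarchy."""
        here = prefix + (self.label,)
        if not self.children:
            yield here
        for child in self.children:
            yield from child.paths(here)


# Fixed part of the schema: the taxonomy, which does not change.
taxonomy = Concept("TAXONOMY", [
    Concept("NEWS", [Concept("International"), Concept("National")]),
    Concept("OPINION", [Concept("Editorial/Op-Ed")]),
])

# Repeating part: the template that each news article instance follows.
article_template = Concept("ARTICLE", [
    Concept("headline"),
    Concept("author"),
    Concept("timestamp"),
    Concept("summary"),
])

schema = Concept("FRONT_PAGE", [taxonomy, article_template])


def instantiate(template: Concept, values: Dict[str, str]) -> Dict[str, str]:
    """Fill the template's leaf fields with the content of one article."""
    return {child.label: values[child.label] for child in template.children}


# One article instance, using the truncated strings quoted in the abstract.
article = instantiate(article_template, {
    "headline": "Bush tells Nation ...",
    "author": "By Richard W. Stevenson ...",
    "timestamp": "<time-stamp>",  # placeholder; not given in the abstract
    "summary": "President Bush portrayed ...",
})

for path in schema.paths():
    print(" / ".join(path))
```

In this sketch the schema stays the same from one visit to the next, while each call to `instantiate` supplies the content that changes.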
Similar resources
Query expansion based on relevance feedback and latent semantic analysis
Web search engines are among the most popular tools on the Internet and are widely used by expert and novice users alike. Constructing an adequate query that best specifies the user's information need to the search engine is an important concern for web users. Query expansion is a way to reduce this concern and increase user satisfaction. In this paper, a new method of query expa...
Centralized Clustering Method To Increase Accuracy In Ontology Matching Systems
Ontology is the main infrastructure of the Semantic Web, providing facilities for the integration, searching, and sharing of information on the web. The development of ontologies as the basis of the Semantic Web, together with their heterogeneity, has led to the emergence of ontology matching. With the emergence of large-scale ontologies in real domains, ontology matching systems face problems such as memory con...
An Executive Approach Based On the Production of Fuzzy Ontology Using the Semantic Web Rule Language Method (SWRL)
Today, the need to deal with ambiguous information in semantic web languages is increasing. Ontology is an important part of the W3C standards for the semantic web, used to define a conceptual standard vocabulary for the exchange of data between systems, the provision of reusable databases, and the facilitation of collaboration across multiple systems. However, classical ontology is not enough ...
AHP Techniques for Trust Evaluation in Semantic Web
The increasing reliance on information gathered from the web and other internet technologies raises the issue of trust. In the development of the Semantic Web, one major difficulty is that, by its very nature, the Semantic Web is a large, uncensored system to which anyone may contribute. This raises the question of how much credence to give each resource. Each user knows the trustworthiness of ...
A procedure for Web Service Selection Using WS-Policy Semantic Matching
In general, policy-based approaches play an important role in the management of web services, for instance in the choice of semantic web services and, in particular, quality of service (QoS). The present research work illustrates a procedure for web service selection among functionally similar web services based on WS-Policy semantic matching. In this study, the procedure of WS-Policy publi...